Abstract

This paper considers multi-input multi-output (MIMO) communication systems in which perfect
channel state information at the transmitter (CSIT) is given while the equivalent channel state information at
the receiver (CSIR) is not available. This assumption is valid for downlink multi-user MIMO systems
with linear precoders that depend on the channels to all receivers. We propose a concept called dual systems with
zero-forcing designs, based on the duality principle originally proposed to relate Gaussian multi-access channels
(MACs) and Gaussian broadcast channels (BCs). For the two-user N x 2 MIMO BC with N antennas at the
transmitter and two antennas at each receiver, we design a downlink interference cancellation (IC) transmission
scheme using the dual of uplink MAC systems that employ IC methods. The transmitter simultaneously
sends two precoded Alamouti codes, one for each user. Each receiver can zero-force the unintended user's
Alamouti codes and decouple its own data streams using two simple linear operations that are independent of CSIR.
Analysis shows that the proposed scheme achieves a diversity gain of 2(N - 1) for equal-energy constellations
with short-term power and rate constraints. Power allocation between the two users can also be performed; it
improves the array gain but not the diversity gain. Numerical results demonstrate that the downlink IC scheme
achieves a substantial bit error rate gain over the block diagonalization method, which requires global
channel information at each node.

Index Terms: Multi-antenna systems, broadcast channels, duality, block diagonalization, interference 
cancellation, space-time coding, orthogonal designs. 

I. Introduction 

System performance can be improved by learning the fading coefficients of wireless channels.
A pilot sequence is inserted at the beginning of each data stream to help the receiver estimate the
unknown fading coefficients. Then, techniques such as receive beamforming or coherent detection can
exploit the known channel state information at the receiver (CSIR) before it becomes outdated.
Channel state information at the transmitter (CSIT) can also be obtained through feedback channels if
the channel coherence interval is longer than the feedback delay. Also, when the system operates in
time division duplex (TDD) mode, the forward and reverse channel coefficients are approximately
the same due to reciprocity [1]. Then, the CSIR obtained on the reverse channel is used as the CSIT
for the forward channel. Techniques such as rate adaptation, power allocation, and transmit beamforming
can be performed at the transmitter using the knowledge of CSIT. With respect to the assumptions
on the channel information, communication systems can be classified into four categories as listed in
Table I. The first three categories have been extensively discussed in [2], while, to the best of our
knowledge, communication systems with CSIT and no CSIR (System D) have been ignored. Generally,
obtaining channel information at the receiver is easier than obtaining it at the transmitter. Although
System D may not seem natural, its use can be illustrated by transmission in broadcast channels (BCs).

In a multi-user multi-input multi-output (MIMO) BC, the transmitter simultaneously sends multiple
independent data streams intended for different users. The information-theoretic aspects of the BC, e.g.,
the sum capacity or the capacity region of the vector MIMO Gaussian BC, have received much attention
[10]-[12]. The capacity is achieved by exploiting CSIT through a nonlinear precoding method called
dirty paper coding (DPC) [13], in which the knowledge of the codewords of previously encoded data
streams is used to encode a new data stream. For practical systems, such nonlinear precoders are
discouraged because of their high complexity. A class of linear precoders, known as zero-forcing (ZF)
precoders or the block diagonalization (BD) method, has been proposed to reduce the complexity [14]. It is
also known that this low-complexity linear alternative achieves the maximum multiplexing gain [15]. The
data streams of each user are independently encoded, and a ZF precoder is designed for each data stream
to null out its interference at the unintended receivers. The design of ZF precoders was later generalized
in [16] using minimum mean square error criteria. One limitation of ZF precoding methods is
the requirement of perfect CSIT. Also, the number of users that can be served is constrained by the number
of antennas at the base station. For a cell with a large number of receivers, multi-user scheduling and
quantized feedback design are discussed in [17], [18]. The achievable sum capacity scales linearly with
log log K, where K denotes the number of receivers.

There is another practical limitation of ZF precoders: their implementation incurs a substantial exchange
of channel information. The transmitter can learn the global CSIT through feedback channels
or reciprocal channels and design the ZF precoders. Note that each user's precoder also depends on the
channels to the other users, and each receiver does not know the other users' channels. The equivalent
channels, which consist of both the precoders and the fading channels, are therefore still unknown at each
receiver, and another round of training to learn them is necessary for coherent detection. Without this
extra training, the ZF precoders decouple the BC into parallel non-interfering systems with CSIT and no
CSIR, and a transmission scheme designed for System D can be applied directly to each of them.
This motivates our work to study transmission with CSIT and no CSIR.
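To make the CSIR issue concrete, the following is a minimal sketch of a BD-type ZF precoder, not the authors' code. It assumes the column convention y_k = H_k x + n_k with H_k of size M x N (the paper itself uses row vectors), i.i.d. CN(0, 1) channels, and hypothetical helper names `bd_precoders` and `crandn`. Each precoder is an orthonormal basis of the null space of the other users' stacked channels, obtained from an SVD, so H_j B_k = 0 for j != k.

```python
import numpy as np


def bd_precoders(H_list):
    """Hypothetical helper: block-diagonalization (ZF) precoders.

    H_list[k] is the M x N channel to receiver k, in the column convention
    y_k = H_k @ x + n_k. Returns B_k of size N x (N - (K-1)M) with orthonormal
    columns and H_j @ B_k = 0 for every j != k (requires N > (K-1)M).
    """
    K = len(H_list)
    precoders = []
    for k in range(K):
        # Stack the channels of all unintended receivers.
        H_others = np.vstack([H_list[j] for j in range(K) if j != k])
        # Right singular vectors beyond the row rank span the null space.
        _, _, Vh = np.linalg.svd(H_others)  # full_matrices=True by default
        rank = H_others.shape[0]            # full row rank with probability one
        precoders.append(Vh[rank:].conj().T)
    return precoders


def crandn(rng, *shape):
    """i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


rng = np.random.default_rng(0)
K, M, N = 2, 2, 4                           # 2 users, 2 receive antennas each, 4 transmit antennas
H = [crandn(rng, M, N) for _ in range(K)]
B = bd_precoders(H)

leak = np.abs(H[0] @ B[1]).max()            # interference at receiver 0: numerically zero
H_eq0 = H[0] @ B[0]                         # what receiver 0 needs; B[0] was computed from H[1]
print(B[0].shape, leak < 1e-10)
```

Receiver 0 knows only its own channel H[0], so without another training round it cannot form the equivalent channel H_eq0, because B[0] depends on H[1]. This is exactly the gap that transmission with CSIT and no CSIR addresses.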

Most previous designs of ZF precoders focus on maximizing the sum capacity or the multiplexing
gain, which requires infinite signaling dimensions. From the reliability perspective, the diversity
gain achieved with finite signaling dimensions is of equal importance. A comparison between opportunistic
scheduling and BD transmission using space-time block codes (STBCs) is made in [19]. For a
K-user N x M MIMO BC, where the transmitter has N antennas and each of the K receivers has M
antennas, serving only the user with the strongest channel gains achieves the maximum
diversity gain of KMN. For BD transmission concatenated with STBCs, the achievable diversity
gain is (N - (K - 1)M)M, which is far less than the maximum possible diversity gain. In this paper,
we also aim at improving the achievable diversity gain of ZF precoders.
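The gap between these expressions can be checked with a few lines of arithmetic. The sketch below simply evaluates the diversity-gain formulas quoted above, together with the 2(N - 1) gain stated for the proposed scheme later in the paper, for K = 2 users with M = 2 receive antennas; the function names are illustrative only.

```python
def diversity_gains(N, M, K):
    """Diversity gains quoted in the text for a K-user N x M MIMO BC."""
    return {
        "scheduling (max)": K * M * N,
        "BD + STBC": (N - (K - 1) * M) * M,  # meaningful only for N > (K - 1) M
    }


def proposed_ic_gain(N):
    """Diversity gain stated for the downlink IC scheme (K = 2, M = 2)."""
    return 2 * (N - 1)


for N in (3, 4, 6):
    g = diversity_gains(N, M=2, K=2)
    print(N, g["scheduling (max)"], g["BD + STBC"], proposed_ic_gain(N))
```

For K = M = 2 the proposed scheme exceeds BD with STBCs by exactly 2 for every N >= 3, since 2(N - 1) - 2(N - 2) = 2, while both remain below the scheduling bound 4N.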

To achieve these goals, we adopt the idea of duality, which was originally proposed to relate the Gaussian
multi-access channel (MAC) and the Gaussian BC with perfect channel information at all nodes [10],
[11]. We propose a new duality principle that connects two linear systems with ZF designs, and apply this
principle to transform known transmission schemes for the original systems into new schemes for their dual
systems. Specifically, we use an uplink MAC system performing interference cancellation (IC) [20], [21]
as the original system and propose downlink IC schemes for BCs. The contributions of this paper are
summarized as follows:

1) We propose a duality principle that connects two real systems with linear ZF designs. The transmit
(receive) filters of the first system are used as the receive (transmit) filters of the second system.
When both systems have the same input power distribution and the channel matrix of the first
system is the transpose of that of the second system, the instantaneous signal-to-noise ratios (SNRs)
at the outputs of the receive filters are identical for both systems. We illustrate this principle
by constructing the dual of Alamouti systems, and obtain a new beamforming method called dual
Alamouti codes.
2) For the two-user N x 2 MIMO BC, we propose a downlink IC scheme that concurrently sends a
precoded Alamouti code to each user, and cancels interference and decodes messages blindly (without CSIR)
at each receiver. It achieves a diversity gain of 2(N - 1), rate one per user, and symbol-by-symbol
decoding complexity. Compared to linear BD methods, the proposed scheme does not achieve
the maximum rate for N > 2, but it requires fewer transmit antennas, does not need perfect
global CSIR at each receiver, and achieves a higher diversity gain, which improves the bit error rate
(BER) performance. The proposed scheme also achieves a higher diversity gain than the BD method
even when the latter is concatenated with an STBC [19], [22].

The rest of the paper is organized as follows. In Section II, we present our definition of duality and
construct the dual Alamouti codes from the original Alamouti codes for point-to-point MIMO systems.
In Section III, we propose the downlink IC scheme for the two-user N x 2 MIMO BC and analyze its
diversity gain. Simulation results are provided in Section IV. Finally, concluding remarks
are given in Section V, and the more involved proofs are included in the appendices.

Notation: For a matrix $\mathbf A$, let $\mathbf A^T$, $\mathbf A^*$, and $\|\mathbf A\|$ denote its transpose, Hermitian, and Frobenius norm,
respectively. Denote $\mathbf 0_{m,n}$, $\mathbf 0_m$, and $\mathbf I_m$ as an $m\times n$ all-zero matrix, an $m\times m$ all-zero matrix, and an
$m\times m$ identity matrix, respectively. The Kronecker product is denoted as $\otimes$. We define $\mathcal N(0,1)$ and
$\mathcal{CN}(0,1)$ as the real Gaussian distribution and the circularly symmetric complex Gaussian distribution with
zero mean and unit variance, respectively. We also use $\Re a$ and $\Im a$ to denote the real and imaginary
components of a complex variable $a$, respectively. The sets of real and complex numbers are denoted
as $\mathbb R$ and $\mathbb C$, respectively. Similarly, the sets of $N\times M$ real and complex matrices are denoted as
$\mathbb R^{N\times M}$ and $\mathbb C^{N\times M}$, respectively.

II. Duality for ZF Designs 

We design systems with CSIT and no CSIR from known systems with CSIR and no CSIT. These two
systems are connected through duality, where the roles of the transmit and receive filters are exchanged.
In this section, we present a new duality principle for linear systems with ZF designs. Subsection II-A
discusses the duality principle, and Subsection II-B provides an example that constructs the dual of the
Alamouti systems.

A. The Duality Principle 

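We consider two real systems as illustrated in Fig. 1. System 1 has an input vector $\mathbf s\in\mathbb R^{1\times n}$ and an
output vector $\mathbf d\in\mathbb R^{1\times m}$ with the input-output relationship

$$\mathbf d = \mathbf s\mathbf Z + \mathbf n, \tag{1}$$

where $\mathbf Z\in\mathbb R^{n\times m}$ denotes the equivalent channel matrix, and $\mathbf n\in\mathbb R^{1\times m}$ denotes the i.i.d. $\mathcal N(0,1)$
additive white Gaussian noise (AWGN) vector. System 2 has an input vector $\mathbf x\in\mathbb R^{1\times m}$ and an output
vector $\mathbf r\in\mathbb R^{1\times n}$ with the input-output relationship

$$\mathbf r = \mathbf x\mathbf F + \mathbf w, \tag{2}$$

where $\mathbf F\in\mathbb R^{m\times n}$ and $\mathbf w\in\mathbb R^{1\times n}$ denote the equivalent channel matrix and the i.i.d. $\mathcal N(0,1)$ noise
vector, respectively.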

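Multiple information streams $c_l$ ($l = 1,\dots,J$) are transmitted simultaneously through these two
systems using transmit and receive filters. For System 1, Stream $c_l$ is sent using $\mathbf u_l\in\mathbb R^{1\times n}$ as the
transmit filter and $\mathbf v_l\in\mathbb R^{1\times m}$ as the receive filter. The filters are normalized as $\|\mathbf u_l\|^2 = \|\mathbf v_l\|^2 = 1$.
The input vector is generated through linear superposition of all streams as $\mathbf s = \sum_{l=1}^{J}\sqrt{\mathcal P_l}\,c_l\mathbf u_l$, where
the coefficient $\mathcal P_l$ denotes the power allocated to Stream $c_l$ and satisfies the power constraint
$\sum_{l=1}^{J}\mathcal P_l = \mathcal P$. The receiver calculates $y_k = \mathbf d\mathbf v_k^T$ to extract Stream $c_k$. From (1), the equivalent system at
the output of $\mathbf v_k$ can be expressed as

$$y_k = \sum_{l=1}^{J}\sqrt{\mathcal P_l}\,c_l\,\mathbf u_l\mathbf Z\mathbf v_k^T + \mathbf n\mathbf v_k^T. \tag{3}$$

The filters are designed based on ZF constraints. In other words, the output of $\mathbf v_k$ contains no component
of Stream $c_l$ for $l\neq k$. Mathematically, we have

$$\mathbf u_l\mathbf Z\mathbf v_k^T = 0,\quad l\neq k. \tag{4}$$

Thus, the equivalent system in (3) simplifies to $y_k = \sqrt{\mathcal P_k}\,c_k\mathbf u_k\mathbf Z\mathbf v_k^T + \mathbf n\mathbf v_k^T$. Since $\|\mathbf v_k\| = 1$, the noise
term $\mathbf n\mathbf v_k^T$ has unit variance, and the SNR at the output of $\mathbf v_k$ can be expressed as

$$\mathrm{SNR}_{\mathrm{orig},k} = \mathcal P_k\,|\mathbf u_k\mathbf Z\mathbf v_k^T|^2. \tag{5}$$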

For System 2, Stream $c_l$ is sent using the transmit filter $\mathbf v_l$ (the receive filter of System 1) and
received using the filter $\mathbf u_l$ (the transmit filter of System 1). The input vector is generated with the same power
allocation profile as System 1, i.e., $\mathbf x = \sum_{l=1}^{J}\sqrt{\mathcal P_l}\,c_l\mathbf v_l$. The receiver uses the output of the receive filter $\mathbf u_k$
to extract Stream $c_k$. The system equation can be expressed as

$$r_k = \Big(\sum_{l=1}^{J}\sqrt{\mathcal P_l}\,c_l\mathbf v_l\Big)\mathbf F\mathbf u_k^T + \mathbf w\mathbf u_k^T. \tag{6}$$

The receiver treats the interfering streams as noise, and the equivalent output SNR at the filter $\mathbf u_k$ is

$$\mathrm{SNR}_{\mathrm{dual},k} = \frac{\mathcal P_k\,|\mathbf v_k\mathbf F\mathbf u_k^T|^2}{\sum_{l\neq k}\mathcal P_l\,|\mathbf v_l\mathbf F\mathbf u_k^T|^2 + \|\mathbf u_k\|^2}. \tag{7}$$

The first term of the denominator represents the power of interference, while the second term represents
the variance of the equivalent noise $\mathbf w\mathbf u_k^T$. In what follows, we define duality.

Definition 1: System 2 is called the dual of System 1 with ZF designs if:

1) System 1 uses $\mathbf u_l$ and $\mathbf v_l$ as transmit and receive filters, respectively, while System 2 uses $\mathbf v_l$ and $\mathbf u_l$ as
transmit and receive filters, respectively.

2) Both systems have the same power allocation profile $\mathcal P_l$.

3) The channel matrices are related by $\mathbf F = \mathbf Z^T$.

4) The filters $\mathbf u_l$ and $\mathbf v_k$ satisfy the ZF relationship in (4).

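Proposition 1: Both the original and the dual systems have the same SNR at the output of the $k$th
receive filter.

Proof: When $\mathbf F = \mathbf Z^T$, each interference term in (7) satisfies $\mathbf v_l\mathbf Z^T\mathbf u_k^T = (\mathbf u_k\mathbf Z\mathbf v_l^T)^T = 0$ for $l\neq k$
by the ZF relationship in (4), so the power of interference $\mathcal P_l|\mathbf v_l\mathbf Z^T\mathbf u_k^T|^2$ vanishes. Thus, with $\|\mathbf u_k\| = 1$,
(7) simplifies to $\mathrm{SNR}_{\mathrm{dual},k} = \mathcal P_k|\mathbf v_k\mathbf Z^T\mathbf u_k^T|^2$. Since $\mathbf v_k\mathbf Z^T\mathbf u_k^T = \mathbf u_k\mathbf Z\mathbf v_k^T$ is a scalar, comparing with (5)
gives $\mathrm{SNR}_{\mathrm{orig},k} = \mathrm{SNR}_{\mathrm{dual},k}$. ■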
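Proposition 1 is easy to check numerically. The sketch below is an illustration under assumptions that are not part of the paper: a random real channel Z, random unit-norm transmit filters, and receive filters taken from an SVD null space so that (4) holds; the helper names `zf_filters`, `snr_orig`, and `snr_dual` are hypothetical. It evaluates (5) and (7) with F = Z^T and compares the two SNRs.

```python
import numpy as np


def zf_filters(Z, J, rng):
    """Hypothetical construction of unit-norm filters that satisfy (4).

    Transmit filters u_l are random unit vectors; each receive filter v_k is a
    unit vector orthogonal to u_l Z for every l != k (needs J <= m).
    """
    n, m = Z.shape
    U = rng.standard_normal((J, n))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    V = np.empty((J, m))
    for k in range(J):
        A = np.delete(U, k, axis=0) @ Z      # rows u_l Z for l != k
        _, _, Vt = np.linalg.svd(A)          # trailing rows of Vt span null(A)
        V[k] = Vt[-1]                        # already unit norm
    return U, V


def snr_orig(U, V, Z, P):
    """Eq. (5): P_k |u_k Z v_k^T|^2."""
    return P * np.einsum("kn,nm,km->k", U, Z, V) ** 2


def snr_dual(U, V, Z, P):
    """Eq. (7) with F = Z^T: transmit with v_l, receive with u_k."""
    G = V @ Z.T @ U.T                        # G[l, k] = v_l F u_k^T
    signal = P * np.diag(G) ** 2
    interference = (P[:, None] * G ** 2).sum(axis=0) - signal
    noise = np.sum(U ** 2, axis=1)           # ||u_k||^2 = 1
    return signal / (interference + noise)


rng = np.random.default_rng(1)
n, m, J = 6, 5, 4
Z = rng.standard_normal((n, m))               # real equivalent channel of System 1
P = np.array([1.0, 2.0, 0.5, 3.0])           # power allocation profile P_l
U, V = zf_filters(Z, J, rng)
print(np.allclose(snr_orig(U, V, Z, P), snr_dual(U, V, Z, P)))
```

The equality holds for any power profile, because every cross term v_l Z^T u_k^T equals u_k Z v_l^T and therefore vanishes by (4).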

B. Application in point-to-point MIMO systems: Dual Alamouti codes 

In this subsection, we present an example of applying the duality principle to obtain the dual of
Alamouti systems [6]. For a 2 x N MIMO system with two antennas at the transmitter and N antennas
at the receiver, the Alamouti system sends two symbols $s_1$ and $s_2$ in two time slots. The symbols
are drawn independently and uniformly from a normalized constellation $\mathcal S$ with finite cardinality. For
simplicity, we assume $\mathcal S$ to be a PSK constellation. The transmitted matrix has the Alamouti structure

$$\sqrt{\frac P2}\begin{bmatrix} s_1 & s_2\\ -s_2^* & s_1^*\end{bmatrix}, \tag{8}$$

where $P$ denotes the transmitted power per time slot. Let us denote $y_{ti}$ as the receive signal at time
slot $t$ and receive Antenna $i$. The receiver calculates the negative conjugate of the receive signal in
time slot 2 and stacks the signals into a $2N\times 1$ vector $\mathbf y = [y_{11}\ \ {-y_{21}^*}\ \cdots\ y_{1N}\ \ {-y_{2N}^*}]^T$. The receiver
further decouples the two symbols by calculating $\hat{\mathbf y} = \frac{1}{\|\mathbf G\|}\boldsymbol{\mathcal G}^*\mathbf y$. The matrix $\mathbf G\in\mathbb C^{2\times N}$ denotes the
channel matrix whose $(j,i)$ entry $g_{ji}$ is the channel coefficient from transmit Antenna $j$ to receive
Antenna $i$. The matrix $\boldsymbol{\mathcal G}\in\mathbb C^{2N\times 2}$ is composed of the $2\times 2$ equivalent Alamouti channel matrices $\mathbf G_i$
at Antenna $i$:

$$\mathbf G_i = \begin{bmatrix} g_{1i} & g_{2i}\\ -g_{2i}^* & g_{1i}^*\end{bmatrix},\qquad \boldsymbol{\mathcal G} = \begin{bmatrix}\mathbf G_1\\ \vdots\\ \mathbf G_N\end{bmatrix}. \tag{9}$$



Note that $\hat{\mathbf y}\in\mathbb C^{2\times 1}$. Symbol-by-symbol decoding of $s_j$ is performed based on the $j$th component of $\hat{\mathbf y}$.
In what follows, we use duality to obtain the dual of Alamouti systems. Since symbol conjugates are
excluded from the definition of duality while Alamouti codes send conjugates of symbols, we separate the
real and imaginary components of each symbol and stack them into a $2\times 1$ vector. The conjugate
of this symbol can then be written as a linear transformation of this vector using the $2\times 2$ matrix
$\mathrm{diag}[1,-1]$. For the same reason, we also separate the real and imaginary parts of the receive signals.
Let us denote the $j$th component of $\hat{\mathbf y}$ as $\hat y_j$, which is the output after the symbols are decoupled at the
receiver. We stack the real and imaginary parts of $\hat y_1$ and $\hat y_2$ into a $1\times 4$ vector $[\Re\hat y_1\ \ \Im\hat y_1\ \ \Re\hat y_2\ \ \Im\hat y_2]$. For the
Alamouti system, the equivalent equation after symbol decoupling can be expanded as



^yi 


1 


91 Sl 


1 




^^2 


= >/? 


Jsi 

9^S2 
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G 04_2Ar 
04,2Af G 




ni 
n2 


^?/2 




OfS2 


l2®G 


n 



V^ 



(10) 



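where $\mathbf U\in\mathbb R^{4\times 8}$, $\bar{\mathbf G}\in\mathbb R^{4\times 2N}$, $\mathbf V\in\mathbb R^{4\times 4N}$, and $\mathbf n\in\mathbb R^{1\times 4N}$ denote the transmit processing matrix, the
equivalent fading channel matrix, the receive processing matrix, and the equivalent noise vector, respectively.
The matrices $\bar{\mathbf G}$ and $\mathbf n_t$ can be expressed as

$$\bar{\mathbf G} = \begin{bmatrix} \Re g_{11} & \Im g_{11} & \cdots & \Re g_{1N} & \Im g_{1N}\\ -\Im g_{11} & \Re g_{11} & \cdots & -\Im g_{1N} & \Re g_{1N}\\ \Re g_{21} & \Im g_{21} & \cdots & \Re g_{2N} & \Im g_{2N}\\ -\Im g_{21} & \Re g_{21} & \cdots & -\Im g_{2N} & \Re g_{2N}\end{bmatrix},\qquad \mathbf n_t = [\Re n_{t1}\ \ \Im n_{t1}\ \cdots\ \Re n_{tN}\ \ \Im n_{tN}],$$

where $n_{ti}$ denotes the $\mathcal{CN}(0,1)$ noise at time slot $t$ and receive Antenna $i$. The transmit processing matrix is
$\mathbf U = [\mathbf I_4\ \ \mathbf A]$, and the receive processing matrix is $\mathbf V = \frac{1}{\|\mathbf G\|}\tilde{\mathbf G}\mathbf D$ with $\mathbf D = \mathrm{diag}[\mathbf I_{2N},\ \mathbf I_N\otimes\mathrm{diag}[-1,1]]$.
Writing $\mathbf M(c) = \begin{bmatrix}\Re c & \Im c\\ -\Im c & \Re c\end{bmatrix}$ for the real $2\times 2$ matrix that represents multiplication by a complex scalar $c$
(so that $\bar{\mathbf G}$ consists of the blocks $\mathbf M(g_{ji})$), the matrices $\mathbf A$ and $\tilde{\mathbf G}$ can be expressed as

$$\mathbf A = \begin{bmatrix}\mathbf 0_2 & \mathrm{diag}[1,-1]\\ \mathrm{diag}[-1,1] & \mathbf 0_2\end{bmatrix},\qquad \tilde{\mathbf G} = \begin{bmatrix}\mathbf M(g_{11}) & \cdots & \mathbf M(g_{1N}) & \mathbf M(-g_{21}^*) & \cdots & \mathbf M(-g_{2N}^*)\\ \mathbf M(g_{21}) & \cdots & \mathbf M(g_{2N}) & \mathbf M(g_{11}^*) & \cdots & \mathbf M(g_{1N}^*)\end{bmatrix}. \tag{11}$$

In the transmit processing matrix $\mathbf U$, each row represents one dispersion matrix of the Alamouti code [2].
In the receive processing matrix $\mathbf V$, the matrix $\mathbf D$ computes the negative conjugate
of the receive signal in time slot 2, while the matrix $\tilde{\mathbf G}/\|\mathbf G\|$ decouples the two symbols.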
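The real-valued model in (10) and (11) can be sanity-checked numerically. The sketch below assumes a random i.i.d. CN(0, 1) channel G with N = 3 and uses hypothetical helper names `M` and `real_alamouti_blocks`; here M(c) is the real 2 x 2 matrix that represents multiplication by c on a row vector [Re a, Im a], and G_tilde is arranged as [M(g_1i) ... M(-g_2i^*) ...; M(g_2i) ... M(g_1i^*) ...]. It checks that the noiseless end-to-end map U (I_2 ⊗ G_bar) V^T equals ||G|| I_4, which is both the symbol-decoupling property and the ZF condition (4) for these filters.

```python
import numpy as np


def M(c):
    """Real 2 x 2 matrix with [Re a, Im a] @ M(c) == [Re(a c), Im(a c)]."""
    return np.array([[c.real, c.imag], [-c.imag, c.real]])


def real_alamouti_blocks(G):
    """Build U, G_bar, D and G_tilde of (10)-(11) for a 2 x N channel G."""
    N = G.shape[1]
    A = np.block([[np.zeros((2, 2)), np.diag([1.0, -1.0])],
                  [np.diag([-1.0, 1.0]), np.zeros((2, 2))]])
    U = np.hstack([np.eye(4), A])
    G_bar = np.block([[M(G[0, i]) for i in range(N)],
                      [M(G[1, i]) for i in range(N)]])
    D = np.diag(np.concatenate([np.ones(2 * N), np.tile([-1.0, 1.0], N)]))
    G_tilde = np.block([
        [M(G[0, i]) for i in range(N)] + [M(-np.conj(G[1, i])) for i in range(N)],
        [M(G[1, i]) for i in range(N)] + [M(np.conj(G[0, i])) for i in range(N)],
    ])
    return U, G_bar, D, G_tilde


rng = np.random.default_rng(2)
N = 3
G = (rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))) / np.sqrt(2)
U, G_bar, D, G_tilde = real_alamouti_blocks(G)
norm_G = np.linalg.norm(G)                        # Frobenius norm ||G||
V = G_tilde @ D / norm_G                          # receive processing matrix, 4 x 4N
effective = U @ np.kron(np.eye(2), G_bar) @ V.T   # noiseless map in (10)
print(np.allclose(effective, norm_G * np.eye(4)))
```

The rows of V and of U / sqrt(2) also have unit norm, which matches the normalized transmit and receive filters used in the duality argument.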

Note that the system equation in (10) resembles the original system (1) with a $1\times 8$ input vector, a $1\times 4N$
output vector, and the channel matrix $\mathbf I_2\otimes\bar{\mathbf G}$. Four data streams are transmitted: $c_1 = \Re s_1$, $c_2 = \Im s_1$,
$c_3 = \Re s_2$, and $c_4 = \Im s_2$. The streams are sent using the rows of $\mathbf U/\sqrt 2$ as the transmit filters (the transmit
filters are normalized by dividing by $\sqrt 2$) and the rows of $\mathbf V$ as the receive filters. Let $n = 8$, $m = 4N$,
and $\mathbf Z = \mathbf I_2\otimes\bar{\mathbf G}$. The transmit power is $\mathcal P = 4P$ per transmission and is equally allocated among the four
streams. It can also be verified that these filters satisfy the ZF conditions in (4). Using the definition
of duality, we obtain the system equation of the dual system as



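$$[\Re r_1\ \ \Im r_1\ \ \Re r_2\ \ \Im r_2] = \sqrt P\,[\Re s_1\ \ \Im s_1\ \ \Re s_2\ \ \Im s_2]\,\mathbf V\begin{bmatrix}\bar{\mathbf G}^T & \mathbf 0_{2N,4}\\ \mathbf 0_{2N,4} & \bar{\mathbf G}^T\end{bmatrix}\frac{\mathbf U^T}{\sqrt 2} + \mathbf w\frac{\mathbf U^T}{\sqrt 2}, \tag{12}$$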



where the rows of $\mathbf V$ are the transmit filters and the rows of $\mathbf U/\sqrt 2$ are the receive filters. Note that
$\mathbf V$ depends on the channel information, while $\mathbf U$ is independent of it. Hence, the dual
system uses CSIT and does not require CSIR. For the dual system, denote the channel path from transmit
Antenna $i$ to receive Antenna $j$ as $h_{ij}$. From the definition of duality, we have $g_{ji} = h_{ij}^*$. Replacing
$g_{ji}$ with $h_{ij}^*$ in (12), combining the real and imaginary components, and converting back to matrix
form, we obtain the dual Alamouti codes.

More specifically, the dual Alamouti system is described as follows. The transmitter has $N$ antennas
and the receiver has 2 antennas. Denote the $N\times 2$ channel matrix as $\mathbf H$, whose $(i,j)$ entry is $h_{ij}$. The two
symbols $s_1$ and $s_2$ first form a $2\times 2$ Alamouti block code. Then, a scaled Hermitian of $\mathbf H$ is used as
the precoder. The $2\times N$ transmitted matrix can be expressed as

$$\mathbf X = \frac{\sqrt P}{\|\mathbf H\|}\begin{bmatrix} s_1 & s_2\\ -s_2^* & s_1^*\end{bmatrix}\mathbf H^*. \tag{13}$$

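The transmit power can be verified to be $P$ per time slot, and the power is equally allocated between the
two symbols. The $(t,i)$ entry of $\mathbf X$ is sent from Antenna $i$ in time slot $t$. The receive signal at Antenna
$j$ can be expressed as

$$\begin{bmatrix} r_{1j}\\ r_{2j}\end{bmatrix} = \mathbf X\mathbf h_j + \begin{bmatrix} w_{1j}\\ w_{2j}\end{bmatrix} = \frac{\sqrt P}{\|\mathbf H\|}\begin{bmatrix} s_1 & s_2\\ -s_2^* & s_1^*\end{bmatrix}\mathbf H^*\mathbf h_j + \begin{bmatrix} w_{1j}\\ w_{2j}\end{bmatrix} = \frac{\sqrt P}{\|\mathbf H\|}\begin{bmatrix} s_1 & s_2\\ -s_2^* & s_1^*\end{bmatrix}\begin{bmatrix}\mathbf h_1^*\mathbf h_j\\ \mathbf h_2^*\mathbf h_j\end{bmatrix} + \begin{bmatrix} w_{1j}\\ w_{2j}\end{bmatrix}, \tag{14}$$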



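where $\mathbf h_j$ denotes the channel vector to receive Antenna $j$, i.e., the $j$th column of $\mathbf H$, and $w_{tj}$ denotes
the $\mathcal{CN}(0,1)$ AWGN at receive Antenna $j$ and time slot $t$. Taking the conjugate of the receive signals in time
slot 2, we can stack the receive signals into a $4\times 1$ vector as

$$\begin{bmatrix} r_{11}\\ r_{21}^*\\ r_{12}\\ r_{22}^*\end{bmatrix} = \frac{\sqrt P}{\|\mathbf H\|}\begin{bmatrix}\mathbf h_1^*\mathbf h_1 & \mathbf h_2^*\mathbf h_1\\ \mathbf h_1^*\mathbf h_2 & -\mathbf h_1^*\mathbf h_1\\ \mathbf h_1^*\mathbf h_2 & \mathbf h_2^*\mathbf h_2\\ \mathbf h_2^*\mathbf h_2 & -\mathbf h_2^*\mathbf h_1\end{bmatrix}\begin{bmatrix} s_1\\ s_2\end{bmatrix} + \begin{bmatrix} w_{11}\\ w_{21}^*\\ w_{12}\\ w_{22}^*\end{bmatrix}. \tag{15}$$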



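Without CSIR, the receiver decouples $s_1$ and $s_2$ as

$$\hat r_1 = \frac{r_{11} + r_{22}^*}{\sqrt 2} = \frac{\sqrt P}{\sqrt 2\|\mathbf H\|}\big(s_1\mathbf h_1^*\mathbf h_1 + s_2\mathbf h_2^*\mathbf h_1 - s_2\mathbf h_2^*\mathbf h_1 + s_1\mathbf h_2^*\mathbf h_2\big) + \frac{w_{11} + w_{22}^*}{\sqrt 2} = \sqrt{\frac P2}\|\mathbf H\|s_1 + \frac{w_{11} + w_{22}^*}{\sqrt 2}, \tag{16}$$

$$\hat r_2 = \frac{r_{12} - r_{21}^*}{\sqrt 2} = \frac{\sqrt P}{\sqrt 2\|\mathbf H\|}\big(s_1\mathbf h_1^*\mathbf h_2 + s_2\mathbf h_2^*\mathbf h_2 + s_2\mathbf h_1^*\mathbf h_1 - s_1\mathbf h_1^*\mathbf h_2\big) + \frac{w_{12} - w_{21}^*}{\sqrt 2} = \sqrt{\frac P2}\|\mathbf H\|s_2 + \frac{w_{12} - w_{21}^*}{\sqrt 2}. \tag{17}$$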

The power of the equivalent noise is normalized by dividing by $\sqrt 2$. To decode $s_j$, the receiver performs
maximum-likelihood (ML) decoding for the equivalent systems in (16) and (17). Expanding the square and dividing
by the positive factor $\sqrt{P/2}\|\mathbf H\|$ gives

$$\hat s_j = \arg\min_{s\in\mathcal S}\Big|\hat r_j - \sqrt{\frac P2}\|\mathbf H\|s\Big|^2 = \arg\max_{s\in\mathcal S}\ 2\Re(\hat r_js^*) - \sqrt{\frac P2}\|\mathbf H\||s|^2,\quad j\in\{1,2\}. \tag{18}$$

Since $\mathcal S$ is a PSK constellation, the second term $\sqrt{P/2}\|\mathbf H\||s|^2$ has the same value for all points in $\mathcal S$
and thus can be ignored¹. Eq. (18) then simplifies to $\max_{s\in\mathcal S}\Re(\hat r_js^*)$. This shows that ML
decoding can be performed without CSIR. Symbol-by-symbol decoding complexity is also obtained,
i.e., $s_1$ and $s_2$ can be decoded separately.
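The dual Alamouti transmitter and its CSIR-free receiver can be sketched in a few lines. This is an illustration under assumed parameters (N = 4, P = 10, QPSK symbols, an i.i.d. CN(0, 1) channel), and the function names are hypothetical. The receiver touches only its own received samples, yet recovers sqrt(P/2) ||H|| s_j as in (16) and (17), and the simplified rule max Re(r_j s^*) picks the transmitted point.

```python
import numpy as np


def crandn(rng, *shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def dual_alamouti_tx(s1, s2, H, P):
    """2 x N transmit block of (13): sqrt(P)/||H|| * Alamouti(s1, s2) @ H^*."""
    S = np.array([[s1, s2], [-np.conj(s2), np.conj(s1)]])
    return np.sqrt(P) / np.linalg.norm(H) * S @ H.conj().T


def dual_alamouti_rx(R):
    """Combine the 2 x 2 receive block R[t, j] as in (16)-(17); uses no CSIR."""
    r1 = (R[0, 0] + np.conj(R[1, 1])) / np.sqrt(2)
    r2 = (R[0, 1] - np.conj(R[1, 0])) / np.sqrt(2)
    return r1, r2


def detect_psk(r, constellation):
    """Simplified ML rule max Re(r s^*), valid for constant-modulus constellations."""
    return constellation[np.argmax(np.real(r * np.conj(constellation)))]


rng = np.random.default_rng(3)
N, P = 4, 10.0
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
H = crandn(rng, N, 2)                        # N x 2 channel, known only at the transmitter
s1, s2 = qpsk[0], qpsk[3]
X = dual_alamouti_tx(s1, s2, H, P)
R = X @ H                                    # noiseless receive block, R[t, j] = x_t h_j
r1, r2 = dual_alamouti_rx(R)
gain = np.sqrt(P / 2) * np.linalg.norm(H)    # scalar channel in (16)-(17)
print(np.allclose([r1, r2], [gain * s1, gain * s2]))
```

Because the precoder H^* is normalized by ||H||, the total transmitted energy over the two slots is 2P for unit-modulus symbols, i.e., P per time slot as stated above.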

The dual Alamouti codes need perfect CSIT and no CSIR, and thus belong to System D in Table I.
Their performance is described next. Two symbols are sent in two time slots in (13), so the rate
of the dual codes is one. From (16) and (17), the receive SNR of both symbols is $\frac P2\|\mathbf H\|^2$, which is the
same as that of the original Alamouti system. This agrees with Proposition 1. Since the ML decoding of
dual Alamouti codes can be conducted without CSIR, the Alamouti codes and their dual codes achieve the
same array gain and diversity gain.

III. The Downlink IC Scheme for a Two-user MIMO BC 

This section focuses on the two-user MIMO BC with CSIT and no CSIR. Using our dual
Alamouti code, each user can be served in an orthogonal time slot to avoid interference. However, this
time division multiple access (TDMA) method sacrifices the transmission rate. We propose a concurrent
transmission scheme for these systems using the duality principle of Section II. In the uplink
two-user MAC, the IC scheme in [20], [21] concurrently transmits both users' symbols through Alamouti
codes and linearly zero-forces the interference at the receiver. It achieves full transmit diversity and
user-by-user decoding complexity. Since it satisfies the conditions for the original system described in
Definition 1, we use the uplink IC scheme as the original system and construct its dual system for
BCs. In Subsection III-A, we review the uplink IC scheme. Subsection III-B presents the downlink IC
scheme. Further discussions on power allocation and diversity gain are provided in Subsection III-C.

We describe our transmission schemes for a two-user N x 2 BC with N antennas at the transmitter
and two antennas at each receiver. The generalization to a BC with any number of users and antennas
is straightforward using the duality principle. The channel coefficients are i.i.d. $\mathcal{CN}(0,1)$ distributed
and known perfectly at the transmitter but not at the receivers. We also assume that the channels are
block fading and remain constant during each transmission. We adopt the same notation as in Section II
and add a superscript $(k)$ as the user index. Thus, $g_{ji}^{(k)}$ denotes the channel coefficient from transmit Antenna
$j$ of User $k$ to receive Antenna $i$ in the original MAC, and $h_{ij}^{(k)}$ denotes the channel coefficient from
transmit Antenna $i$ to receive Antenna $j$ of User $k$ in the dual BC.

¹Generalizing dual Alamouti codes to other constellations such as QAM is discussed in [23].



A. The Uplink IC Scheme 

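We review the IC scheme for a two-user 2 x N MAC with two antennas at each user and N antennas
at the receiver [20], [21]. The system diagram is illustrated in Fig. 2. User $k$ sends two independent
symbols $s_1^{(k)}$ and $s_2^{(k)}$ to the receiver, encoded by an Alamouti code with power $P/2$ per time slot.
Both users transmit concurrently, so power is equally allocated between the two users and the total
power of the network is $P$. Denote $y_{ti}$ as the received signal at time slot $t$ and Antenna $i$. The receiver
stacks the receive signals into a $2N\times 1$ vector $\mathbf y = [y_{11}\ \ {-y_{21}^*}\ \cdots\ y_{1N}\ \ {-y_{2N}^*}]^T$. The
system equation of $\mathbf y$ can be expressed as

$$\mathbf y = \sqrt{\frac P4}\begin{bmatrix}\mathbf G_1^{(1)} & \mathbf G_1^{(2)}\\ \vdots & \vdots\\ \mathbf G_N^{(1)} & \mathbf G_N^{(2)}\end{bmatrix}\begin{bmatrix} s_1^{(1)}\\ s_2^{(1)}\\ s_1^{(2)}\\ s_2^{(2)}\end{bmatrix} + \mathbf n = \sqrt{\frac P4}\Big(\boldsymbol{\mathcal G}^{(1)}\mathbf s^{(1)} + \boldsymbol{\mathcal G}^{(2)}\mathbf s^{(2)}\Big) + \mathbf n, \tag{19}$$

where $\mathbf s^{(k)} = [s_1^{(k)}\ \ s_2^{(k)}]^T$, $\mathbf n = [n_{11}\ \ {-n_{21}^*}\ \cdots\ n_{1N}\ \ {-n_{2N}^*}]^T$ denotes the equivalent $2N\times 1$ noise vector,
$\mathbf G_i^{(k)}$ denotes the equivalent Alamouti channel matrix from User $k$ to Antenna $i$ as defined in (9), and
$\boldsymbol{\mathcal G}^{(k)}$ stacks $\mathbf G_1^{(k)},\dots,\mathbf G_N^{(k)}$ as in (9).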

The receiver decouples the four symbols in two steps. In Step 1, the symbols of the two users are separated
using ZF. Two $2(N-1)\times 2N$ ZF matrices are formed as

(20)

where the receiving filter $\mathbf Z^{(k)}$ is used to zero-force the symbols of User $\bar k$, i.e., $\mathbf Z^{(k)}\boldsymbol{\mathcal G}^{(\bar k)} = \mathbf 0_{2(N-1),2}$, and
extracts the symbols of User $k$. The IC process can be conducted as

$$\mathbf y^{(k)} = \mathbf Z^{(k)}\mathbf y = \sqrt{\frac P4}\,\mathbf Z^{(k)}\boldsymbol{\mathcal G}^{(k)}\mathbf s^{(k)} + \mathbf Z^{(k)}\mathbf n, \tag{21}$$

where the index $\bar k = \{1,2\}\setminus k$ denotes the user other than User $k$. The vector $\mathbf y^{(k)}$ contains only the
symbols of User $k$. In Step 2, the two symbols of the same user are further decoupled. Note that the
$2\times 2$ submatrices contained in the resulting equivalent channel $\mathbf Z^{(k)}\boldsymbol{\mathcal G}^{(k)}$ have the Alamouti structure, because
Alamouti matrices are closed under addition and multiplication. The receiver constructs the
$2\times 2(N-1)$ symbol separating filter for User $k$ as

$$\mathbf F^{(k)} = \alpha^{(k)}\big(\mathbf Z^{(k)}\boldsymbol{\mathcal G}^{(k)}\big)^*, \tag{22}$$

where the coefficient $\alpha^{(k)}$ normalizes the equivalent combined filter $\mathbf F^{(k)}\mathbf Z^{(k)}$ so that each of its rows has
unit norm. The two symbols are separated through

$$\mathbf F^{(k)}\mathbf y^{(k)} = \sqrt{\frac P4}\,\alpha^{(k)}\big(\mathbf Z^{(k)}\boldsymbol{\mathcal G}^{(k)}\big)^*\mathbf Z^{(k)}\boldsymbol{\mathcal G}^{(k)}\mathbf s^{(k)} + \mathbf F^{(k)}\mathbf Z^{(k)}\mathbf n = \sqrt{\frac P4}\,\frac{\alpha^{(k)}\|\mathbf Z^{(k)}\boldsymbol{\mathcal G}^{(k)}\|^2}{2}\,\mathbf s^{(k)} + \mathbf F^{(k)}\mathbf Z^{(k)}\mathbf n, \tag{23}$$

where the second equality uses the Alamouti structure of $\mathbf Z^{(k)}\boldsymbol{\mathcal G}^{(k)}$. The resulting channel for each symbol is
therefore a scalar, and each entry of the equivalent noise vector $\mathbf F^{(k)}\mathbf Z^{(k)}\mathbf n$ can be verified to be i.i.d. $\mathcal{CN}(0,1)$.
Thus, symbol-by-symbol decoding can be conducted based on the $l$th entry of $\mathbf F^{(k)}\mathbf y^{(k)}$ to recover $s_l^{(k)}$. In total,
four symbol-by-symbol decoding procedures are needed at the receiver to recover the transmitted symbols.
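The two-step receiver can be illustrated numerically. The sketch below is not the authors' construction: instead of the structured matrix in (20), it swaps in a generic ZF matrix whose rows are an orthonormal basis of the orthogonal complement of the interferer's stacked channel (computed by SVD), and it omits the normalization alpha^(k). It assumes noiseless reception, N = 3, P = 8, and hypothetical helper names. It shows that Step 1 removes the other user completely and that, because sums and products of Alamouti matrices are again Alamouti matrices, (Z G)^*(Z G) is a scaled identity, so Step 2 decouples the two symbols.

```python
import numpy as np


def crandn(rng, *shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def alamouti_stack(G):
    """Stack the 2 x 2 blocks G_i = [[g1i, g2i], [-g2i*, g1i*]] of (9) into a 2N x 2 matrix."""
    return np.vstack([
        np.array([[G[0, i], G[1, i]], [-np.conj(G[1, i]), np.conj(G[0, i])]])
        for i in range(G.shape[1])
    ])


def generic_zf(G_int):
    """Generic ZF filter, NOT the structured matrix of (20).

    Rows are an orthonormal basis of the orthogonal complement of the
    interferer's column space, so Z @ G_int = 0 and Z @ Z^H = I.
    """
    Uf, _, _ = np.linalg.svd(G_int)          # 2N x 2N unitary factor
    return Uf[:, 2:].conj().T                # 2(N-1) x 2N


rng = np.random.default_rng(4)
N, P = 3, 8.0
Gs = [alamouti_stack(crandn(rng, 2, N)) for _ in range(2)]   # stacked channel of each user
s = [np.array([1.0, 1j]), np.array([-1j, -1.0])]             # unit-modulus symbols
y = np.sqrt(P / 4) * (Gs[0] @ s[0] + Gs[1] @ s[1])            # noiseless version of (19)

for k in range(2):
    Z = generic_zf(Gs[1 - k])                # Step 1: null the other user
    B = Z @ Gs[k]                            # equivalent channel of user k
    out = B.conj().T @ (Z @ y)               # Step 2: matched filter (alpha omitted)
    scale = np.sqrt(P / 4) * np.linalg.norm(B) ** 2 / 2
    print(k, np.allclose(out, scale * s[k]))
```

Since this Z has orthonormal rows, Z n stays white, which is consistent with the i.i.d. noise claim above; the paper's own Z^(k) in (20) may be scaled differently, which is what alpha^(k) compensates for.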

B. The New Downlink IC Scheme 

In this subsection, we propose our downlink IC scheme, which is the dual system of the uplink IC
scheme. Since the details of the transformation from the original system to the dual system are involved
and similar to those in Subsection II-B, we omit the construction steps and directly present the new
transmission scheme.

The system diagram is shown in Fig. 3. In the downlink system, we use the user indices to denote
receivers rather than transmitters. The transmitter sends two symbols $s_1^{(k)}$ and $s_2^{(k)}$ to User $k$, encoded
through Alamouti codes. The precoding process has two steps. The first precoder $\mathbf E^{(k)}$ is called the
symbol separating precoder, and it reuses the structure of the symbol separating filters in (22). Note that $\mathbf F^{(k)}$
is a function of $g_{ji}^{(k)}$ and $g_{ji}^{(\bar k)}$. The $2\times 2(N-1)$ filter $\mathbf E^{(k)}$ is obtained by replacing every $g_{ji}^{(k)}$ in $\mathbf F^{(k)}$
with $h_{ij}^{(k)*}$ for $k\in\{1,2\}$. The output of the symbol separating precoder can be written as

(24)

It allows each receiver to decouple its own symbols without CSIR, as will be shown later.
The second precoder is called the user separating precoder. Let $\mathbf h_i^{(k)} = [h_{i1}^{(k)}\ \ h_{i2}^{(k)}]$ denote the $1\times 2$
channel vector from transmit Antenna $i$ to User $k$. We design the $2(N-1)\times N$ user separating precoder
for User $k$ as



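$$\mathbf B^{(k)} = \begin{bmatrix}\frac{\mathbf h_1^{(k)*}}{\|\mathbf h_1^{(k)}\|^2} & -\frac{\mathbf h_2^{(k)*}}{\|\mathbf h_2^{(k)}\|^2} & \mathbf 0_{2,1} & \cdots & \mathbf 0_{2,1}\\ \mathbf 0_{2,1} & \frac{\mathbf h_2^{(k)*}}{\|\mathbf h_2^{(k)}\|^2} & -\frac{\mathbf h_3^{(k)*}}{\|\mathbf h_3^{(k)}\|^2} & \cdots & \mathbf 0_{2,1}\\ \vdots & & \ddots & \ddots & \vdots\\ \mathbf 0_{2,1} & \cdots & \mathbf 0_{2,1} & \frac{\mathbf h_{N-1}^{(k)*}}{\|\mathbf h_{N-1}^{(k)}\|^2} & -\frac{\mathbf h_N^{(k)*}}{\|\mathbf h_N^{(k)}\|^2}\end{bmatrix}. \tag{25}$$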



It can be observed that each user's separating precoder depends on the channel coefficients to the other
receiver. The second precoder allows the undesired receiver to cancel the interference. The transmitted
$2\times N$ matrix is generated by multiplying the Alamouti code matrix with the two precoders and
adding up the users' signals:

$$\mathbf X = \sqrt{\frac P8}\Big(\mathbf X^{(1)}\mathbf B^{(2)} + \mathbf X^{(2)}\mathbf B^{(1)}\Big). \tag{26}$$



The transmitted power in two time slots is $\mathrm E\,\mathrm{tr}(\mathbf X^*\mathbf X) = 2P$. Therefore, the system satisfies the
short-term power constraint $P$ per time slot. It can also be verified that

$$\mathrm E\,\mathrm{tr}\Big(\frac P8\,\mathbf X^{(k)}\mathbf B^{(\bar k)}\mathbf B^{(\bar k)*}\mathbf X^{(k)*}\Big) = P,\quad k\in\{1,2\}, \tag{27}$$

i.e., power is equally allocated between the two users².



Let us denote the receive signal at time slot $t$ and Antenna $j$ of User $k$ by $r_{tj}^{(k)}$, and the i.i.d. $\mathcal{CN}(0,1)$
noise by $w_{tj}^{(k)}$. Since the channels are Rayleigh flat fading, we have

$$r_{tj}^{(k)} = \sum_{i=1}^{N}x_{ti}h_{ij}^{(k)} + w_{tj}^{(k)}, \tag{28}$$

where $x_{ti}$ denotes the $(t,i)$ entry of the transmit block $\mathbf X$. User $k$ decouples symbols and cancels
interference in one step. The decision variables for $s_1^{(k)}$ and $s_2^{(k)}$ are constructed as

$$\hat r_1^{(k)} = r_{11}^{(k)} + r_{22}^{(k)*}, \tag{29}$$

$$\hat r_2^{(k)} = r_{12}^{(k)} - r_{21}^{(k)*}, \tag{30}$$

respectively. These operations are independent of CSIR, and the receiver does not need to learn the
channel information. In the following proposition, we show that the interference is cancelled and the symbols
are decoupled.

²Since power is equally allocated between the two users in the original uplink MAC system, the dual system also has equal
power allocation. With CSIT, power allocation is possible and is discussed later.

Proposition 2: With the operations in (29) and (30), the signals of the other user, i.e., User $\bar k$, are
zero-forced. Further, the symbols $s_1^{(k)}$ and $s_2^{(k)}$ are decoupled.

Proof: See Appendix V-A. ■

We emphasize that (29) and (30) are similar to (16) and (17) without noise normalization. This
is because the original systems of both the dual Alamouti codes and the downlink IC scheme have
Alamouti codes at the transmitter.

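From the proof of Proposition 2, we can rewrite the equivalent system equation at User $k$ as

$$[\hat r_1^{(k)}\ \ \hat r_2^{(k)}] = \beta^{(k)}\|\mathbf B^{(\bar k)}\boldsymbol{\mathcal H}^{(k)}\|^2\,[s_1^{(k)}\ \ s_2^{(k)}] + \mathbf w^{(k)}, \tag{31}$$

where the notations $\beta^{(k)}$, $\mathbf B^{(\bar k)}$, $\boldsymbol{\mathcal H}^{(k)}$, and $\mathbf w^{(k)}$ can be found in the appendices in (42), (39), (40),
and (41), respectively. In what follows, we discuss the decoding of $s_1^{(k)}$ and $s_2^{(k)}$. From (41), it can be
verified that the components of the equivalent $1\times 2$ noise vector $\mathbf w^{(k)}$ are i.i.d. $\mathcal{CN}(0,2)$ distributed.
The ML decoding for symbol $s_1^{(k)}$ can be conducted as

$$\hat s_1^{(k)} = \arg\min_{s\in\mathcal S}\Big|\hat r_1^{(k)} - \beta^{(k)}\|\mathbf B^{(\bar k)}\boldsymbol{\mathcal H}^{(k)}\|^2 s\Big|^2 = \arg\max_{s\in\mathcal S}\ 2\Re\{\hat r_1^{(k)}s^*\} - \beta^{(k)}\|\mathbf B^{(\bar k)}\boldsymbol{\mathcal H}^{(k)}\|^2|s|^2. \tag{32}$$

When $s_1^{(k)}$ is drawn from a PSK constellation, the second term is constant for all points in
the constellation, and the ML decoding simplifies to $\max_{s\in\mathcal S}\Re\{\hat r_1^{(k)}s^*\}$. Thus, it can be performed
without knowing CSI. For other constellations, a decision-feedback method can be used to estimate the
coefficient $\beta^{(k)}\|\mathbf B^{(\bar k)}\boldsymbol{\mathcal H}^{(k)}\|^2$ from the previously decoded symbol. The details of a similar estimation
can be found in [23].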

C. Discussion

In this subsection, we discuss power allocation and performance analysis in terms of the diversity 
gain and symbol rate. 

First, we study power allocation. The transmitted matrix in (26) assumes equal power allocation 
between two users. With CSIT, we can further allocate power between two users to compensate the 
channels in deep fading. We can rewrite (26) as 

X = 7? (ciX(i)b(2) + c2X(2)b(i)) , (33) 

where Ck denotes the power allocation coefficient of User k. To satisfy the sum power constraint 
tr (X*X) = 2P, we have cl + c^ = 2. From (31), we can calculate the output SNR for each symbol as 

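$$\mathrm{SNR}^{(k)} = \frac{1}{2}\left(\sqrt{P}\,\beta^{(k)} c_k\,\big\|\mathbf{B}^{(\bar{k})}\mathbf{h}^{(k)}\big\|^2\right)^2 = \frac{P}{8}\,c_k^2\, b_k, \qquad (34)$$

where the factor 1/2 accounts for the $\mathcal{CN}(0, 2)$ noise and $b_k$ denotes the resulting channel-dependent gain of User k.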

To minimize the decoding error probability, the transmitter distributes the total power so as to maximize the smaller of $\mathrm{SNR}^{(1)}$ and $\mathrm{SNR}^{(2)}$. The optimization problem can be equivalently written as

$$\max_{c_1, c_2}\ \min\left(\frac{P}{8}c_1^2 b_1,\ \frac{P}{8}c_2^2 b_2\right) \quad \text{s.t.}\quad c_1^2 + c_2^2 = 2. \qquad (35)$$

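In terms of the variables $c_1^2$ and $c_2^2$, the above problem can be converted to a linear optimization problem. Using Lagrange multiplier methods, the solution is $c_1^2 = \frac{2b_2}{b_1 + b_2}$ and $c_2^2 = \frac{2b_1}{b_1 + b_2}$, which equalizes the two output SNRs. The resulting output SNR can be calculated as

$$\mathrm{SNR}^{(1)} = \mathrm{SNR}^{(2)} = \frac{P\, b_1 b_2}{4(b_1 + b_2)}. \qquad (36)$$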
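A minimal sketch of the max-min allocation in (35) and (36), using arbitrary illustrative values of P, b_1, and b_2 (not values from the paper); a brute-force grid search over c_1^2 is included to cross-check the closed form.

```python
def optimal_power_allocation(b1, b2):
    """Closed-form max-min solution of (35): returns (c1^2, c2^2)."""
    total = b1 + b2
    return 2.0 * b2 / total, 2.0 * b1 / total


def output_snr(p, c_sq, b):
    """Output SNR of one user from (34): (P / 8) * c_k^2 * b_k."""
    return p / 8.0 * c_sq * b


def brute_force_max_min(p, b1, b2, steps=20_000):
    """Grid search over c1^2 in [0, 2] with c2^2 = 2 - c1^2."""
    best = 0.0
    for i in range(steps + 1):
        c1_sq = 2.0 * i / steps
        worst = min(output_snr(p, c1_sq, b1), output_snr(p, 2.0 - c1_sq, b2))
        best = max(best, worst)
    return best


if __name__ == "__main__":
    p, b1, b2 = 10.0, 1.0, 3.0                       # illustrative values only
    c1_sq, c2_sq = optimal_power_allocation(b1, b2)  # weaker user gets more power
    snr1, snr2 = output_snr(p, c1_sq, b1), output_snr(p, c2_sq, b2)
    print(c1_sq, c2_sq, snr1, snr2)
    print(p * b1 * b2 / (4 * (b1 + b2)), brute_force_max_min(p, b1, b2))
```

The closed form assigns more power to the user with the smaller $b_k$, which is the compensation for deep fades described above.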

The proposed downlink IC scheme satisfies the short-term power constraints in (27) and has a decoding delay of two time slots. With perfect CSIT and short-term power constraints, the decoding error probability of any space-time block coding system has only a finite diversity gain [24]. We provide the diversity gain analysis in the following theorem.

Theorem 1: For a two-user N × 2 BC system with perfect CSIT, no CSIR, and equal-energy constellations, the downlink IC scheme achieves a diversity gain of 2(N − 1) for both the equal power allocation and the optimal power allocation given in (35).

Proof: See Appendix V-B. ■ 
The IC scheme in the two-user 2 × N MAC system can achieve a diversity gain of 2(N − 1) [25]. This theorem shows that the downlink IC scheme achieves the same diversity gain as the uplink IC scheme. Intuitively, this can be predicted from Proposition 1, because the two dual systems have the same distributions of the instantaneous receive SNRs under equal power allocation. Since the diversity gain depends on the outage probability of the instantaneous receive SNR [26], the two systems achieve the same diversity gain. Further power allocation for the downlink scheme cannot degrade the resulting diversity gain, because the max-min allocation yields an output SNR no smaller than that of equal power allocation. Hence, the downlink scheme with the optimal power allocation also achieves a diversity gain of 2(N − 1).
From Theorem 1, we also observe that the full receive diversity gain of 2 is achieved and that the transmit diversity gain is N − 1. Compared with the method concatenating STBCs and BD transmission, which has a receive diversity of 2 and a transmit diversity of N − 2 [19], our proposed scheme has a higher transmit diversity at the same transmission rate.

Finally, we end this section with a discussion of the rate. Each user equivalently receives an Alamouti code in two time slots. Thus, the symbol rate is 1 symbol/channel use/user, and the throughput of the whole network is 2 symbols/channel use. When the transmitter is equipped with two antennas, i.e., N = 2, our proposed scheme achieves the maximum multiplexing gain. When the transmitter has more than two antennas, our scheme does not achieve the maximum multiplexing gain min{N, 4}, yet it still has a rate benefit compared with orthogonal TDMA methods.

IV. Numerical Results 

We show the simulated BER performance of the proposed dual Alamouti codes and the downlink IC scheme, and compare it with related schemes in the literature. The figures in this section plot BER on the vertical axis against the average receive SNR, in dB, on the horizontal axis. Since the noise and the fading channels are normalized, the average receive SNR equals the transmit power.

First, we compare the dual Alamouti codes with the original Alamouti codes, differential transmission [5], and singular value decomposition (SVD) transmission along the dominant eigen-direction. All schemes have a rate of one and achieve full spatial diversity in point-to-point MIMO systems, but their channel information requirements differ. The dual Alamouti codes need CSIT but no CSIR, whereas the original Alamouti codes need CSIR but no CSIT. Differential transmission requires neither CSIT nor CSIR, and the SVD scheme needs both. Fig. 4 shows the simulated BER performance in a 2 × 2 MIMO system with QPSK and 8PSK constellations. The dual Alamouti codes achieve the same BER as the original Alamouti codes, as predicted by Proposition 1. The SVD scheme outperforms the other three schemes because it uses both CSIT and CSIR; its gap to the dual codes is approximately 2.5 dB. Compared with the SVD scheme, the dual codes trade BER performance for fewer resources spent on learning CSIR.

Next, the proposed downlink IC schemes, with both equal and optimal power allocation, are compared with the BD methods using STBCs [19], [22] and with an opportunistic TDMA scheme. Note that for a two-user N × 2 BC, the BD method requires at least four transmit antennas. Alamouti codes are used on top of the ZF precoder to improve the diversity gain and reduce the decoding complexity [22]. Such a BD method achieves 1 symbol/channel use/user, the same symbol rate as the downlink IC scheme, but each receiver requires global CSIR to decouple the two symbols carried in the Alamouti code of each transmission. To compare with a TDMA scheme that has CSIT and no CSIR, we propose an opportunistic TDMA scheme that assigns orthogonal time slots to the users, schedules the user whose channel coefficients have the larger norm, and transmits to that user using dual Alamouti codes. The opportunistic TDMA scheme achieves on average only 0.5 symbol/channel use/user, so a higher-order constellation is needed to compensate for the rate loss. It also requires a longer decoding delay than both concurrent transmission schemes.

Figs. 5 and 6 show the BER performance at 1 bit/channel use/user and 2 bits/channel use/user, respectively. In Fig. 5, BPSK is used for the downlink IC schemes and the BD methods, and QPSK for the opportunistic TDMA scheme; in Fig. 6, the downlink IC schemes and the BD methods use QPSK, and the opportunistic TDMA scheme uses 16-QAM. We first compare the downlink IC scheme using equal power allocation with its optimal-power-allocation counterpart. In both figures, power allocation yields an array gain of approximately 1 dB for N = 3 and N = 4, and about 0.5 dB for N = 2. This observation is consistent with Theorem 1: power allocation improves the array gain but not the diversity gain. Next, we compare the downlink IC scheme with the BD method. In Figs. 5 and 6, the downlink IC scheme with N = 4 outperforms the BD method with N = 4 over the entire simulated SNR range. Even with N = 3, the downlink IC scheme outperforms the BD method with N = 4 for SNR > 10 dB. These results indicate that the proposed downlink IC scheme achieves a higher diversity gain than the BD method.
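The constellation choices above follow from simple rate bookkeeping: with 1 symbol/channel use/user for the downlink IC and BD schemes and 0.5 symbol/channel use/user on average for opportunistic TDMA, the bits per symbol must equal the target bit rate divided by the symbol rate. A small sketch (the function and dictionary names are illustrative):

```python
SYMBOL_RATE = {  # symbols/channel use/user, as stated in the text
    "IC": 1.0,
    "BD": 1.0,
    "TDMA-DA": 0.5,
}


def constellation_size(bit_rate, symbol_rate):
    """Size M of a constellation with log2(M) = bit_rate / symbol_rate."""
    bits_per_symbol = bit_rate / symbol_rate
    m = 2.0 ** bits_per_symbol
    if abs(m - round(m)) > 1e-9:
        raise ValueError("no power-of-two constellation matches this rate")
    return int(round(m))


if __name__ == "__main__":
    for bit_rate in (1.0, 2.0):  # target rates of Figs. 5 and 6
        sizes = {name: constellation_size(bit_rate, rate)
                 for name, rate in SYMBOL_RATE.items()}
        print(bit_rate, sizes)
```

Sizes M = 2, 4, and 16 correspond to BPSK, QPSK, and 16-QAM, respectively.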
Finally, we compare the downlink IC scheme with the opportunistic TDMA scheme. In Fig. 5, the opportunistic TDMA scheme has a substantial gain over the downlink IC scheme: the opportunistic TDMA scheme with N = 2 outperforms the downlink IC scheme with N = 4. This is because the opportunistic TDMA scheme exploits a multiuser diversity gain of 2N × 2 = 4N, which also explains its gain in Fig. 6 for SNR > 11 dB. On the other hand, Fig. 6 shows that for N = 3 and N = 4, the downlink IC scheme has an approximately 1 dB gain over the opportunistic TDMA scheme for SNR < 10 dB. This improvement arises because the downlink IC scheme has a higher symbol rate than the opportunistic TDMA scheme and can therefore use a lower-order constellation at the same bit rate.

V. Conclusions 

This paper investigates the design of communication systems with CSIT and no CSIR. We show that such scenarios arise for concurrent transmissions in BC systems when users do not know the channels of other users. We propose a duality principle for systems with ZF designs, which connects systems that know CSIR but not CSIT with systems that know CSIT but not CSIR. As an example, we construct the dual system of the Alamouti codes and propose the dual Alamouti codes for point-to-point MIMO systems. For the two-user downlink MIMO BC, we take an IC scheme for the uplink MAC as the original system and derive its downlink dual system, called the downlink IC scheme. The transmitter uses CSIT to design linear precoders, and each receiver cancels interference and decouples its own data streams using two linear operations independent of CSIR. Power allocation between the two users is also discussed. For a two-user N × 2 BC, the downlink IC schemes achieve a diversity gain of 2(N − 1) at a rate of 1 symbol/channel use/user with both equal and optimal power allocation. Compared with the full-rate BD scheme, the proposed schemes trade rate for higher diversity. Simulation results demonstrate their superior BER performance over the BD methods concatenated with STBCs, which require global CSIR at each receiver.

Appendices 

A. Proof of Proposition 2 

We prove this proposition for User 1 only. Since the network is symmetric, i.e., the user indices can be exchanged, similar results can be obtained for User 2. We first show that the signals of User 2 are zero-forced at User 1. Since the 2 × 2(N − 1) matrix $\mathbf{E}^{(k)}$ has the same structure as $\mathbf{F}^{(k)}$, the 2 × 2 submatrices of $\mathbf{E}^{(k)}$ also have Alamouti structures. Then, $\mathbf{X}^{(k)}$ in (24) is composed of 2 × 2 matrices with Alamouti structures, because Alamouti structures are closed under multiplication. We can represent $\mathbf{X}^{(k)}$ by



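$$\mathbf{X}^{(k)} = \begin{bmatrix} \tilde{x}_{11}^{(k)} & \tilde{x}_{21}^{(k)} & \cdots & \tilde{x}_{1(N-1)}^{(k)} & \tilde{x}_{2(N-1)}^{(k)} \\ \tilde{x}_{21}^{(k)*} & -\tilde{x}_{11}^{(k)*} & \cdots & \tilde{x}_{2(N-1)}^{(k)*} & -\tilde{x}_{1(N-1)}^{(k)*} \end{bmatrix}, \quad k \in \{1, 2\}.$$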



The received signals $r_{11}^{(1)}$ and $r_{22}^{(1)}$ can be expanded as

[Equation (37), the expansion of $r_{11}^{(1)}$ and $r_{22}^{(1)}$, is illegible in source.] (37)



Denote [channel definitions illegible in source], the last of which has an Alamouti structure. It follows that

[Equation (38) is illegible in source.] (38)



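$$r_{11}^{(1)} + r_{22}^{(1)*} = \sqrt{P}\,\mathbf{x}^{(1)}\mathbf{B}^{(2)}\mathbf{h}^{(1)} + w_{11}^{(1)} + w_{22}^{(1)*}, \qquad (39)$$

where the 2(N − 1) × 2N matrix $\mathbf{B}^{(k)}$ has the same structure as the ZF matrix $\mathbf{T}^{(k)}$ in (20), with each of its 2 × 2 matrices replaced [replacement illegible in source].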
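Similarly, we obtain the corresponding equation for the combination of $r_{21}^{(1)}$ and $r_{12}^{(1)*}$, in which the channel quantities are defined for i = 1, ..., N, and we further define $\mathbf{h}^{(k)}$ [definitions illegible in source]. Collecting the two combined received signals in the 1 × 2 vector $\mathbf{r}^{(1)}$, we can combine these two equations as

$$\mathbf{r}^{(1)} = \sqrt{P}\,\mathbf{x}^{(1)}\mathbf{B}^{(2)}\mathcal{H}^{(1)} + \mathbf{w}^{(1)}, \qquad (40)$$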
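where $\mathbf{w}^{(1)}$ denotes the equivalent 1 × 2 noise vector

$$\mathbf{w}^{(1)} = \left[\,w_{11}^{(1)} + w_{22}^{(1)*}\quad \cdots\,\right], \qquad (41)$$

whose second entry combines $w_{21}^{(1)}$ and $w_{12}^{(1)*}$ [illegible in source]. Note that $\mathcal{H}^{(1)}$ is a 2N × 2 matrix whose ith 2 × 2 submatrix denotes the equivalent Alamouti channel matrix from Antenna i. The signal of User 2 is cancelled at User 1 because $\mathbf{B}^{(1)}$ zero-forces $\mathcal{H}^{(1)}$.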

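In what follows, we show that the symbols $s_1^{(1)}$ and $s_2^{(1)}$ are decoupled. From the structure of the symbol-separating filter, we have $\mathbf{x}^{(1)} = \left[s_1^{(1)}\;\; s_2^{(1)}\right]\mathbf{E}^{(1)}$. Note that $\mathbf{E}^{(1)}$ has the same structure as $\mathbf{F}^{(1)}$, but with $\mathbf{g}^{(k)}$ replaced by $\mathbf{h}^{(k)}$ for k ∈ {1, 2}. From (22), we have

$$\mathbf{E}^{(1)} = \beta^{(1)}\left(\mathbf{B}^{(2)}\mathcal{H}^{(1)}\right)^*, \qquad (42)$$

where $\beta^{(1)} = \sqrt{2}\,/\,\big\|\mathcal{H}^{(1)*}\mathbf{B}^{(2)*}\mathbf{B}^{(2)}\big\|$.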



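Eq. (40) can be further written as

$$\mathbf{r}^{(1)} = \sqrt{P}\,\beta^{(1)}\left[s_1^{(1)}\;\; s_2^{(1)}\right]\left(\mathbf{B}^{(2)}\mathcal{H}^{(1)}\right)^*\mathbf{B}^{(2)}\mathcal{H}^{(1)} + \mathbf{w}^{(1)} = \sqrt{P}\,\beta^{(1)}\left[s_1^{(1)}\;\; s_2^{(1)}\right]\big\|\mathbf{B}^{(2)}\mathbf{h}^{(1)}\big\|^2 + \mathbf{w}^{(1)}.$$

The second equality holds because the 2 × 2 submatrices of $\mathbf{B}^{(2)}\mathcal{H}^{(1)}$ have Alamouti structures.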
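The second equality can be checked numerically. A minimal sketch, assuming every 2 × 2 block of $\mathbf{B}^{(2)}$ and of $\mathcal{H}^{(1)}$ takes the standard Alamouti form [[a, b], [−b*, a*]] with random complex entries (the actual blocks are built from channel coefficients and may use a different sign convention). It verifies that the blocks of $\mathbf{B}^{(2)}\mathcal{H}^{(1)}$ keep the Alamouti form and that $(\mathbf{B}^{(2)}\mathcal{H}^{(1)})^*\mathbf{B}^{(2)}\mathcal{H}^{(1)}$ is a scaled identity. All helper names are hypothetical.

```python
import random


def alamouti(a, b):
    """Standard Alamouti-structured 2x2 block [[a, b], [-conj(b), conj(a)]]."""
    return [[a, b], [-b.conjugate(), a.conjugate()]]


def random_alamouti(rng):
    def c():
        return complex(rng.gauss(0, 1), rng.gauss(0, 1))
    return alamouti(c(), c())


def block_matrix(grid):
    """Assemble a full matrix from a 2D grid of 2x2 blocks."""
    rows = []
    for block_row in grid:
        for r in range(2):
            rows.append([x for blk in block_row for x in blk[r]])
    return rows


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def herm(m):
    """Conjugate transpose."""
    return [[m[i][j].conjugate() for i in range(len(m))] for j in range(len(m[0]))]


def blocks_are_alamouti(m, tol=1e-9):
    """Check that every stacked 2x2 block of a (2n x 2) matrix has Alamouti form."""
    for i in range(0, len(m), 2):
        (a, b), (c, d) = m[i], m[i + 1]
        if abs(c + b.conjugate()) > tol or abs(d - a.conjugate()) > tol:
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    n = 3  # illustrative number of transmit antennas N
    B = block_matrix([[random_alamouti(rng) for _ in range(n)] for _ in range(n - 1)])
    H = block_matrix([[random_alamouti(rng)] for _ in range(n)])
    BH = matmul(B, H)            # 2(N-1) x 2
    gram = matmul(herm(BH), BH)  # expected: squared column norm times I_2
    print(blocks_are_alamouti(BH))
    print(gram)
```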

B. Proof of Theorem 1 

When the system uses an equal-energy constellation, the ML decoding can be conducted without CSIR. The diversity gain then depends only on the output instantaneous receive SNR. A method for analyzing the diversity gain based on the instantaneous receive SNR is presented in [26]; it focuses on the first-order exponent of the outage probability of the instantaneous receive SNR. For equal power allocation with $c_k = 1$, the resulting SNR can be rewritten from (34) as $\mathrm{SNR}^{(k)} = P b_k / 8$. Note that $b_k$ has the same expression as the instantaneous normalized receive SNR of the uplink IC scheme, which achieves a diversity gain of 2(N − 1) [25]. Thus, with equal power allocation, the downlink IC scheme also achieves a diversity gain of 2(N − 1).

When power is distributed to maximize the smaller of the output SNRs of the two users, the resulting SNR can be expressed as $\mathrm{SNR}^{(k)} = \frac{P b_1 b_2}{4(b_1 + b_2)}$. Since equal power allocation is feasible in the optimization problem (35), we always have $\frac{P b_1 b_2}{4(b_1 + b_2)} \ge \frac{P}{8}\min(b_1, b_2)$. Further, because $\frac{b_1 b_2}{b_1 + b_2} \le \frac{b_1 b_2}{b_{\bar{k}}} = b_k$ for k ∈ {1, 2}, where $\bar{k}$ denotes the index of the other user, we obtain $\frac{P b_1 b_2}{4(b_1 + b_2)} \le \frac{P}{4}\min(b_1, b_2)$. It follows that

$$\frac{P}{8}\min(b_1, b_2) \le \mathrm{SNR}^{(k)} \le \frac{P}{4}\min(b_1, b_2),$$

which implies that $\mathrm{SNR}^{(k)}$ scales with $P\min(b_1, b_2)$. Note that both $b_1$ and $b_2$ have diversity gain 2(N − 1). Then, $\min(b_1, b_2)$ also has diversity gain 2(N − 1), since $\Pr\{b_1 < \epsilon\} \le \Pr\{\min(b_1, b_2) < \epsilon\} \le \Pr\{b_1 < \epsilon\} + \Pr\{b_2 < \epsilon\}$. Therefore, the system with optimal power allocation achieves diversity gain 2(N − 1).
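The step that $\min(b_1, b_2)$ keeps the diversity order of $b_1$ and $b_2$ can be illustrated with a toy model. The sketch below assumes, purely for illustration, that $b_1$ and $b_2$ are independent and Gamma-distributed with integer shape d = 2(N − 1), so that their cumulative distribution behaves as ε^d near zero; this is not the exact distribution of $b_k$ in the paper. Because the outage probability satisfies Pr{P b < γ} = Pr{b < γ/P}, a log-log slope of d in ε corresponds to a diversity gain of d in P.

```python
import math


def gamma_cdf_small(eps, d, terms=60):
    """Pr{X < eps} for X ~ Gamma(shape d, scale 1) with integer d.

    Uses the series exp(-eps) * sum_{j >= d} eps^j / j!, which avoids the
    cancellation of 1 - exp(-eps) * sum_{j < d} eps^j / j! at small eps.
    """
    term = eps ** d / math.factorial(d)
    total = 0.0
    for j in range(d, d + terms):
        total += term
        term *= eps / (j + 1)
    return math.exp(-eps) * total


def min_cdf(eps, d):
    """Pr{min(b1, b2) < eps} for independent b1, b2 ~ Gamma(d, 1) (toy model)."""
    f = gamma_cdf_small(eps, d)
    return f * (2.0 - f)  # equals 1 - (1 - f)^2 without cancellation


def log_log_slope(cdf, eps1=1e-3, eps2=1e-4):
    """Slope of log Pr{. < eps} versus log eps, i.e., the diversity order."""
    return math.log(cdf(eps1) / cdf(eps2)) / math.log(eps1 / eps2)


if __name__ == "__main__":
    for n in (2, 3, 4):
        d = 2 * (n - 1)
        single = log_log_slope(lambda e: gamma_cdf_small(e, d))
        both = log_log_slope(lambda e: min_cdf(e, d))
        print(f"N={n}: d={d}, slope(b)={single:.3f}, slope(min)={both:.3f}")
```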
TABLE I

Classification of multi-antenna communication systems with respect to the channel information

Category    CSIT   CSIR   References
System A    No     No     [3]-[5]
System B    No     Yes    [6], [7]
System C    Yes    Yes    [8], [9]
System D    Yes    No
Fig. 1. System diagram for two real systems. They are dual to each other under ZF designs.

Fig. 2. Interference cancellation for the two-user uplink MAC.

Fig. 3. Interference cancellation for the two-user downlink BC.

Fig. 4. Performance comparison in 2 × 2 MIMO systems: dual Alamouti codes, Alamouti codes, differential transmission, and SVD methods.

Fig. 5. Performance comparison in two-user N × 2 BC systems at 1 bit/channel use/user: downlink IC scheme (labeled 'IC'), downlink IC scheme using optimal power allocation (labeled 'IC-PA'), BD methods, and opportunistic TDMA using dual Alamouti codes (labeled 'TDMA-DA').

Fig. 6. Performance comparison in two-user N × 2 MIMO systems at 2 bits/channel use/user: downlink IC scheme (labeled 'IC'), downlink IC scheme using optimal power allocation (labeled 'IC-PA'), BD methods, and opportunistic TDMA using dual Alamouti codes (labeled 'TDMA-DA').